Genres in the Prague Discourse Treebank

نویسندگان

  • Lucie Poláková
  • Pavlína Jínová
  • Jirí Mírovský
چکیده

We present the project of classification of Prague Discourse Treebank documents (Czech journalistic texts) for their genres. Our main interest lies in opening the possibility to observe how text coherence is realized in different types (in the genre sense) of language data and, in the future, in exploring the ways of using genres as a feature for multi-sentence-level language technologies. In the paper, we first describe the motivation and the concept of the genre annotation, and briefly introduce the Prague Discourse Treebank. Then, we elaborate on the process of manual annotation of genres in the treebank, from the annotators' manual work to post-annotation checks and to the inter-annotator agreement measurements. The annotated genres are subsequently analyzed together with discourse relations (already annotated in the treebank) – we present distributions of the annotated genres and results of studying distinctions of distributions of discourse relations across the individual genres.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank

The present paper reports on a preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...

متن کامل

Introducing the Prague Discourse Treebank 1.0

We present the Prague Discourse Treebank 1.0, a collection of Czech texts annotated for various discourse-related phenomena "beyond the sentence boundary". The treebank contains manual annotations of (1), discourse connectives, their arguments and senses, (2), textual coreference, and (3), bridging anaphora, all carried out on 50k sentences of the treebank. Contrary to most similar projects, th...

متن کامل

Discourse Relations in the Prague Dependency Treebank 3.0

The aim of the demo is threefold. First, it introduces the current version of the annotation tool for discourse relations in the Prague Dependency Treebank 3.0. Second, it presents the discourse relations in the treebank themselves, including new additions in comparison with the previous release. And third, it shows how to search in the treebank, with focus on the discourse relations.

متن کامل

Use of Coreference in Automatic Searching for Multiword Discourse Markers in the Prague Dependency Treebank

The paper introduces a possibility of new research offered by a multi-dimensional annotation of the Prague Dependency Treebank. It focuses on exploitation of the annotation of coreference for the annotation of discourse relations expressed by multiword expressions. It tries to find which aspect interlinks these linguistic areas and how we can use this interplay in automatic searching for Czech ...

متن کامل

Annotation of Discourse Connectives for the Prague Dependency Treebank

The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014